4-5/5/2021
Research Workflows
Pipeline
Workflow
R and RStudio
tidyverse
ggplot2
RMarkdown
What is R?
R is:
R language
Why use R?
What is RStudio?
Please start RStudio
RStudio is an integrated development environment (IDE)
R (console/‘scratchpad’); Graphics/visualisation/Help
Excel?
Excel is good for some things
R is excellent for analysis and reproducibility…
R can be run on supercomputers, with extremely large datasets…
RStudio overview - INTERACTIVE DEMO
Variables are like named boxes
x <- 1 / 40
x
## [1] 0.025
x ^ 2
## [1] 0.000625
log(x)
## [1] -3.688879
name <- "Samia"
name
## [1] "Samia"
Variable names are documentation
current_temperature = 28.6
subjectID = "GCF_00001236452.1"
GPS_Location = "54N, 36E"
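R restricts which names are valid and treats upper and lower case as different. The sketch below checks a few illustrative names with base R's make.names(), which returns a syntactically valid version of each name; the candidate names and the weights are made up for the example.

```r
# make.names() returns a syntactically valid version of each name.
# A name that is already valid comes back unchanged.
candidates <- c("x2", "2x", "current_temperature", "subject.ID")
valid <- make.names(candidates) == candidates
names(valid) <- candidates
print(valid)

# Names are case sensitive: weight and Weight are two different variables.
weight <- 70.5
Weight <- 82.1
print(weight == Weight)
```

Only "2x" is reported as invalid here, because a name cannot start with a digit.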
Allowed characters: [a-zA-Z0-9_.]
Names cannot start with a digit (x2 is allowed, 2x is not)
Names are case sensitive (Weight is not the same as weight)
Naming styles: lower_snake, UPPER_SNAKE, lowerCamelCase, UpperCamelCase
Functions (log(), sin() etc.) ≈ “canned script”
sqrt(), lm(), plot()
INTERACTIVE DEMO
args(fname) # arguments for fname
?fname # help page for fname
help(fname) # help page for fname
??fname # any mention of fname
help.search("text") # any mention of "text"
vignette(fname) # worked examples for fname
vignette() # show all available vignettes
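As a small illustration of checking a function's arguments before calling it, the sketch below uses log(), whose second argument, base, defaults to e. The input value is arbitrary.

```r
# args() shows the arguments a function accepts and their defaults.
args(log)

# With no base given, log() is the natural logarithm.
natural <- log(100)

# Naming the argument makes the call self-documenting.
base_ten <- log(100, base = 10)

print(natural)
print(base_ten)
```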
What will be the value of each variable after each statement in the following program?
mass <- 47.5
age <- 122
mass <- mass * 2.3
age <- age - 20
mass = 47.5, age = 102
mass = 109.25, age = 102
mass = 47.5, age = 122
mass = 109.25, age = 122
USE CHALLENGE LINK ON ETHERPAD
THERE IS NO ONE TRUE WAY (only principles)
data?
clean_data?
RStudio tries to help you manage your projects
R Project concept - files and subdirectory structure
Let’s create a project in RStudio
INTERACTIVE DEMO
RStudio projects: https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
We can write code in several ways in RStudio
We’re going to create a new dataset and R script.
INTERACTIVE DEMO
Download the file from the following link to your data/ directory, and extract it
(the link is also available on the course Etherpad page)
Data files can be inspected in RStudio
read.csv(file = "data/inflammation-01.csv", header = FALSE)
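To work with the values afterwards, the result of read.csv() is usually assigned to a variable. A minimal sketch, assuming no data file is to hand: it writes a tiny stand-in CSV (made-up values) to a temporary file and reads it back in the same way. With the course data, the path would be data/inflammation-01.csv instead.

```r
# Write a small headerless CSV to a temporary file (made-up values).
tmp <- tempfile(fileext = ".csv")
writeLines(c("0,1,3", "0,2,4", "1,1,5"), tmp)

# header = FALSE: the first line is data, not column names.
data <- read.csv(file = tmp, header = FALSE)

dim(data)   # number of rows, number of columns
head(data)  # first few rows; columns are named V1, V2, ...
```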
Someone gives you a data file that has:
a comma (,) as the decimal point character
a semicolon (;) as the field separator
How would you open it, using read.csv()?
Use the help function and documentation
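One possible approach, sketched on a made-up temporary file: the help page for read.csv() documents sep and dec arguments, and read.csv2() uses these two values as its defaults.

```r
# Made-up file: semicolon-separated fields, comma as decimal point.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,5;2,25", "3,0;4,75"), tmp)

# sep: field separator; dec: decimal point character.
data <- read.csv(file = tmp, header = FALSE, sep = ";", dec = ",")
print(data)

# read.csv2() has sep = ";" and dec = "," as its defaults.
data2 <- read.csv2(file = tmp, header = FALSE)
print(all.equal(data, data2))
```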
INTERACTIVE DEMO
Square brackets [] index the data: [row, column]
data[1, 1]    # First value in dataset
data[30, 20]  # Middle value of dataset
: separator (meaning ‘to’)
data[1:4, 1:4]  # rows 1 to 4; columns 1 to 4
data[5, ]   # row 5
data[, 16]  # column 16
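These indexing forms behave the same way on any data frame. A minimal sketch on a made-up 4 x 3 data frame (small_data is a hypothetical name, not the course dataset):

```r
# Made-up stand-in for the course data: 4 rows, 3 columns.
small_data <- data.frame(V1 = c(0, 0, 1, 2),
                         V2 = c(1, 2, 1, 3),
                         V3 = c(3, 4, 5, 6))

small_data[1, 1]       # single value: row 1, column 1
small_data[1:2, 2:3]   # rows 1 to 2; columns 2 to 3
small_data[3, ]        # all of row 3 (a one-row data frame)
small_data[, 2]        # all of column 2 (a plain vector)
```

Leaving the row or column position empty selects everything along that dimension.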
INTERACTIVE DEMO
R provides useful functions to summarise data
max(data)         # largest value in dataset
max(data[2, ])    # largest value for row (patient) 2
min(data[, 7])    # smallest value on column (day) 7
mean(data[, 7])   # mean value on day 7
sd(data[, 7])     # standard deviation of values on day 7
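The same summary functions can be tried on a small made-up data frame, where the results are easy to check by hand. The name small_data and its values are assumptions for illustration only.

```r
# Made-up stand-in for the course data: 4 rows, 3 columns.
small_data <- data.frame(V1 = c(0, 0, 1, 2),
                         V2 = c(1, 2, 1, 3),
                         V3 = c(3, 4, 5, 6))

overall_max <- max(small_data)        # largest value anywhere
row_max     <- max(small_data[2, ])   # largest value in row 2
col_min     <- min(small_data[, 3])   # smallest value in column 3
col_mean    <- mean(small_data[, 3])  # mean of column 3
col_sd      <- sd(small_data[, 3])    # standard deviation of column 3

print(c(overall_max, row_max, col_min, col_mean, col_sd))
```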
INTERACTIVE DEMO
Computers exist to do tedious things for us
So apply a function (mean) to each row in the data:
R has several ways to automate this process
apply(X = data, MARGIN = 1, FUN = mean)
MARGIN = 1: rows
MARGIN = 2: columns
rowMeans(data)
colMeans(data)
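A short sketch, on made-up values, of how apply() with each MARGIN relates to rowMeans() and colMeans(). The name small_data is hypothetical.

```r
# Made-up stand-in for the course data: 4 rows, 3 columns.
small_data <- data.frame(V1 = c(0, 0, 1, 2),
                         V2 = c(1, 2, 1, 3),
                         V3 = c(3, 4, 5, 6))

# MARGIN = 1 applies the function to each row; MARGIN = 2 to each column.
row_means_apply <- apply(X = small_data, MARGIN = 1, FUN = mean)
col_means_apply <- apply(X = small_data, MARGIN = 2, FUN = mean)

# rowMeans() and colMeans() are shortcuts for these two common cases.
print(all.equal(unname(row_means_apply), unname(rowMeans(small_data))))
print(all.equal(unname(col_means_apply), unname(colMeans(small_data))))
```

apply() is the more general tool, since FUN can be any function (max, sd, and so on), while rowMeans() and colMeans() only compute means.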
“The purpose of computing is insight, not numbers.” - Richard Hamming
R has many available graphics packages
INTERACTIVE DEMO
avg_inflammation_patient <- apply(data, 1, mean)
plot(avg_inflammation_patient)
max_day_inflammation <- apply(data, 2, max)
plot(max_day_inflammation)
plot(apply(data, 2, min))  # 3 functions in one!
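A minimal sketch of the same plotting pattern with axis labels added, using made-up values so that it runs without the course data. The names small_data, avg_per_row and max_per_col are hypothetical.

```r
# Made-up stand-in for the course data: 4 rows (patients), 3 columns (days).
small_data <- data.frame(V1 = c(0, 0, 1, 2),
                         V2 = c(1, 2, 1, 3),
                         V3 = c(3, 4, 5, 6))

# Summarise first, then plot the summary.
avg_per_row <- apply(small_data, 1, mean)
max_per_col <- apply(small_data, 2, max)

# xlab and ylab label the axes, so the plot documents itself.
plot(avg_per_row, xlab = "Row (patient)", ylab = "Mean value")
plot(max_per_col, type = "b", xlab = "Column (day)", ylab = "Maximum value")
```

Storing the summary in a named variable before plotting keeps each step easy to inspect.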
Can you add plots to your script showing: